Statistical Identification of English Loanwords in Korean Using Automatically Generated Training Data

Authors

  • Kirk Baker
  • Chris Brew
Abstract

This paper describes an accurate, extensible method for automatically classifying unknown foreign words. The method requires minimal monolingual resources and no bilingual training data, which is often difficult to obtain for an arbitrary language pair. We use a small set of phonologically based transliteration rules to generate a potentially unlimited amount of pseudo-data that can be used to train a classifier to distinguish etymological classes of actual words. We ran a series of experiments on identifying English loanwords in Korean to explore the consequences of using pseudo-data in place of the original training data. Results show that a sufficient quantity of automatically generated training data, even when produced by fairly low-precision transliteration rules, can be used to train a classifier that performs within 0.3% of one trained on actual English loanwords (≈ 96% accuracy).
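The general idea in the abstract (rule-generated pseudo-loanwords used as training data for an etymological classifier) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the adaptation rules in `pseudo_loanword` are toy rules over romanized spellings rather than the paper's transliteration rules, the native-word list is a handful of placeholder romanized strings rather than a real lexicon, and the character-bigram naive Bayes model is a stand-in because the abstract does not say which classifier was used.

```python
import math
from collections import Counter

VOWELS = set("aeiou")

def pseudo_loanword(english: str) -> str:
    """Apply toy adaptation rules (hypothetical, not the paper's) to a lowercase spelling."""
    subs = {"f": "p", "v": "b", "z": "j"}  # sounds without a direct counterpart
    chars = [subs.get(c, c) for c in english]
    out = []
    for i, c in enumerate(chars):
        out.append(c)
        nxt = chars[i + 1] if i + 1 < len(chars) else None
        # Insert an epenthetic vowel after a consonant that is followed
        # by another consonant or by the end of the word.
        if c not in VOWELS and (nxt is None or nxt not in VOWELS):
            out.append("eu")
    return "".join(out)

def bigrams(word: str) -> list[str]:
    """Character bigrams with word-boundary markers."""
    padded = f"^{word}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

class BigramNB:
    """Character-bigram naive Bayes with add-one smoothing and uniform class priors."""

    def __init__(self) -> None:
        self.counts: dict[str, Counter] = {}
        self.totals: dict[str, int] = {}
        self.vocab: set[str] = set()

    def fit(self, words, labels):
        for word, label in zip(words, labels):
            counter = self.counts.setdefault(label, Counter())
            for gram in bigrams(word):
                counter[gram] += 1
                self.vocab.add(gram)
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def score(self, word: str, label: str) -> float:
        v = len(self.vocab)
        counter = self.counts[label]
        return sum(
            math.log((counter[g] + 1) / (self.totals[label] + v))
            for g in bigrams(word)
        )

    def predict(self, word: str) -> str:
        return max(self.counts, key=lambda y: self.score(word, y))

# Pseudo-data: English seed words pushed through the toy rules.
seeds = ["film", "bus", "golf", "test", "desk", "card"]
pseudo = [pseudo_loanword(w) for w in seeds]
# Placeholder native-like romanized strings (illustrative only).
native = ["saram", "hanul", "nara", "badak", "mogi", "gangmul"]

model = BigramNB().fit(
    pseudo + native,
    ["loan"] * len(pseudo) + ["native"] * len(native),
)

print(pseudo_loanword("bus"))
print(model.predict(pseudo_loanword("mask")))
print(model.predict("baram"))
```

The point of the sketch is only that no bilingual word pairs are needed: the "loan" class is trained entirely on strings produced by the rules, and the seed list can be made as long as the source-language word list allows.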

Similar resources

Varied adaptation patterns of English stops and fricatives in Korean loanwords: The influence of the P-map

In order to investigate to what extent perceptual factors affect the borrowing process, we examined the borrowing of English obstruents in Korean by comparing loanword adaptation patterns with the natives’ P-map (Steriade, 2001b). The orthographic classification technique was used to obtain the P-map (e.g., Wiik, 1965; Schmidt, 1996); 40 native Koreans were asked to choose the best matching Kor...

Perplexity of bi-phone phonotactic models in Korean loanword phonology

The paper presents a corpus study which shows that the probability distribution of bi-phones in a lexicon of Korean loanwords is significantly different from that in a typical Korean lexicon or a lexicon consisting solely of native Korean and Sino-Korean words. This is demonstrated by comparing the perplexity of two types of bi-phone phonotactic models: a model trained on a set of Korean loanwo...

Old vs. Young Koreans' Vowel Insertion after Word-final English and French Postvocalic Plosives: a Sociolinguistic Account

In a recent data survey, the comparison of the 2011 data to the early 1990s data for English and French loans in Korean adaptation has revealed that the overall frequency of final vowel insertion and that of variable insertion and/or no vowel insertion after the word-final postvocalic plosives [b d g p t k] are significantly decreased and increased, respectively, no matter whether the plosives ...

Phonetics versus phonology: English word final /s/ in Korean loanword phonology

In this paper we consider various perspectives on loanword phonology by examining the borrowing into Korean of English words having a word-final /s/. These have been borrowed into Korean with a tense [ ] followed by an epenthetic vowel, as illustrated by the borrowing of English bus as [ 3 ]. The realization of English word-final /s/ as [ ] is apparently unexpected given that English [s] and Ko...

Publication date: 2008